ABSTRACT

Macro tree transducers (mtts) are a useful formal model for XML query and transformation languages. In this paper we study one of the fundamental decision problems on translations, namely the "translation membership problem", for mtts. For a fixed translation, the translation membership problem asks whether a given input/output pair is an element of the translation. For call-by-name mtts this problem is shown to be NP-complete. The main result is that translation membership for call-by-value mtts is in polynomial time. For several extensions, such as the addition of regular look-ahead or the generalization to multi-return mtts, translation membership is shown to remain in PTIME.

1. INTRODUCTION 

Macro tree transducers (mtts) [6] are a popular formal model for XML query and transformation languages (cf., e.g., [4, 13, 15]). They are powerful enough to represent a wide range of practical transformations, and they subsume various well-known models of tree translations such as attribute grammars, MSO-definable tree translations [2], and pebble tree transducers [16]. Yet mtts have many decidable properties, such as exact typechecking, or emptiness, finiteness, and membership of their domains and ranges. These properties make mtts a useful device for the static verification of XML translation programs.

In the algorithms that decide such properties, we sometimes encounter the "translation membership problem" [11] as a sub-problem. For a fixed translation, the translation membership problem asks whether a given input/output pair is an element of the translation. Although the problem itself seems simple, solving it efficiently is far from trivial, in particular for nondeterministic mtts. Nondeterminism is useful when an mtt approximates the behavior of a "real" (Turing-complete) programming language. For example, a complicated if-then-else expression is translated into an mtt that nondeterministically chooses one of the conditional branches. Depending on the order of evaluation, there are two different models of nondeterministic mtts: call-by-value (also called inside-out, or IO for short) and call-by-name (outside-in, or OI). Note that, in the limit, an mtt operating in OI mode can associate at most $2^{2^{2^{O(n)}}}$-many different output trees with one given input tree of size n. In contrast, for mtts in IO mode the limit is $2^{2^{O(n)}}$ different output trees for a given input tree of size n. Consider the following four rules of an mtt.

start(a(x1)) → double(x1, double(x1, e))
double(a(x1), y1) → double(x1, double(x1, y1))
double(e, y1) → f(y1, y1) | g(y1, y1).

For an input tree of the form $s_n = a(a(\cdots a(e)\cdots))$ with n a-nodes, this mtt generates a full binary tree of height $2^n$, and thus of size $2^{2^n+1}-1$ counting the e-leaves. If the mtt operates in OI derivation mode, then each internal node of the binary output tree is nondeterministically labeled either f or g. There are $2^{2^n}-1$ internal nodes, so $2^{2^{2^n}-1}$ output trees are associated with the input tree $s_n$. If, however, the mtt with the same rules operates in IO derivation mode, then for input $s_n$ it generates only $2^{2^n}$ different output trees, because the nodes on one level of an output tree all have the same label. Thus, mtts in OI derivation mode (call-by-name) have "much more" nondeterminism than mtts in IO derivation mode (call-by-value). This difference suggests that translation membership is computationally harder for OI-mtts than for IO-mtts.
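The counting argument can be checked for small n with the following Python sketch. It is a hand-specialized evaluator for the four rules above only, not a general mtt interpreter, and the tuple encoding of trees is our own assumption. In IO mode the parameter is a single, already evaluated tree. In OI mode the whole set of possible parameter values is passed, so each occurrence of y1 chooses its value independently.

```python
from itertools import product

E = ("e",)  # the leaf e; a tree is a tuple (label, child1, child2, ...)

def double_io(k, y):
    """IO: all outputs of double(a^k(e), y1) for ONE fixed parameter tree y."""
    if k == 0:
        return {("f", y, y), ("g", y, y)}  # both copies of y1 are the same tree
    out = set()
    for inner in double_io(k - 1, y):      # evaluate the argument first ...
        out |= double_io(k - 1, inner)     # ... then call with the chosen value
    return out

def double_oi(k, ys):
    """OI: all outputs of double(a^k(e), y1) when y1 may denote any tree in ys."""
    if k == 0:
        # every occurrence of y1 picks its value from ys independently
        return {(lab, l, r) for lab in ("f", "g") for l, r in product(ys, repeat=2)}
    return double_oi(k - 1, double_oi(k - 1, ys))

def start_io(n):
    """IO outputs for s_n = a^n(e), n >= 1, via start(a(x1)) -> double(x1, double(x1, e))."""
    return {t for inner in double_io(n - 1, E) for t in double_io(n - 1, inner)}

def start_oi(n):
    """OI outputs for s_n = a^n(e), n >= 1."""
    return double_oi(n - 1, double_oi(n - 1, {E}))

for n in (1, 2):
    print(n, len(start_io(n)), len(start_oi(n)))  # prints "1 4 8", then "2 16 32768"
```

The counts agree with $2^{2^n}$ (IO) and $2^{2^{2^n}-1}$ (OI). Already for n = 2 the OI set is 2048 times larger than the IO set.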

In this paper, we first show that translation membership is NP-complete for OI-mtts, and also for compositions of multiple IO-mtts (Section 3). We then present our main result: translation membership for IO-mtts is solvable in polynomial time (Section 4). Our algorithm for IO translation membership is based on a technique called inverse type inference. For an mtt M and a given output type, i.e., a regular tree language L of output trees, inverse type inference constructs a description of the corresponding input type, i.e., of the regular tree language $M^{-1}(L)$. Inverse type inference basically takes exponential time, because the inverse-type automaton itself can be that large [15, 16, 17]. To avoid this, we construct the automaton on-the-fly and thereby obtain PTIME efficiency. Our technique is then generalized to several extensions of IO-mtts, such as the addition of regular look-ahead or the generalization to multi-return mtts. In fact, we even consider a more powerful look-ahead mechanism that is based on tree automata with equality and disequality constraints between siblings [1].

Note that for total deterministic mtts, OI equals IO. Moreover, by Theorem 15 of [12], given an input tree s, the output tree $t = \tau(s)$ can be computed in time $O(|s| + |t|)$, even for an n-fold composition of total deterministic mtts. Hence, translation membership for this class of translations can be solved in linear time by simply computing the output. The result can easily be extended to deterministic but partial mtts (in either IO or OI derivation mode), as mentioned at the end of Section 4.

2. DEFINITIONS 

For a finite set A, we denote by |A| the number of its elements. A finite set Σ with a mapping $\mathit{rank}: \Sigma \to \mathbb{N}$ is called a ranked alphabet. We often write $\sigma^{(k)}$ to indicate that $\mathit{rank}(\sigma) = k$, and write $\Sigma^{(k)}$ to denote the subset of Σ of rank-k symbols. The product of Σ and a set B is the ranked alphabet $\Sigma \times B = \{\langle\sigma, b\rangle^{(k)} \mid \sigma^{(k)} \in \Sigma, b \in B\}$. Throughout the paper, we fix the sets of input variables $X = \{x_1, x_2, \ldots\}$, parameters $Y = \{y_1, y_2, \ldots\}$, and let-variables $Z = \{z_1, z_2, \ldots\}$, which are all of rank 0. We assume any other alphabet to be disjoint from X, Y, and Z. The set $X_i$ is defined as $\{x_1, \ldots, x_i\}$, and $Y_i$ and $Z_i$ are defined similarly.

The set $T_\Sigma$ of trees t over a ranked alphabet Σ is defined by the BNF $t ::= \sigma(\underbrace{t, \ldots, t}_{k})$ for $\sigma \in \Sigma^{(k)}$. We often omit parentheses for rank-0 and rank-1 symbols. We recursively define the function label from $T_\Sigma \times \mathbb{N}^*$ to Σ as follows. For $t = \sigma(t_1, \ldots, t_k)$ with $\sigma^{(k)} \in \Sigma$, $k \ge 0$, and $t_1, \ldots, t_k \in T_\Sigma$, let $\mathit{label}(t, \varepsilon) = \sigma$ and $\mathit{label}(t, i.v) = \mathit{label}(t_i, v)$. Thus, the empty list ε denotes the root node, and v.i denotes the i-th child of v. We define the set $\mathit{pos}(t) = \{v \in \mathbb{N}^* \mid \mathit{label}(t, v) \text{ is defined}\}$. We denote by |t| the number of nodes in the tree t. For a node v of t, $t|_v$ denotes the subtree of t rooted at the node v. For trees $t, t_1, \ldots, t_n \in T_\Sigma$ and $\sigma_1, \ldots, \sigma_n \in \Sigma^{(0)}$, we denote by $t[\sigma_1/t_1, \ldots, \sigma_n/t_n]$ the simultaneous substitution of the $\sigma_i$ by the $t_i$.

Let Σ and Δ be ranked alphabets. A relation $\tau \subseteq T_\Sigma \times T_\Delta$ is called a tree translation (over Σ and Δ), or simply a translation. We define $\mathit{range}(\tau) = \{b \mid \exists a : (a, b) \in \tau\}$. For two translations $\tau_1$ and $\tau_2$, their sequential composition $\tau_1;\tau_2$ ("$\tau_1$ followed by $\tau_2$") is the translation $\{(a, c) \mid \exists b : (a, b) \in \tau_1, (b, c) \in \tau_2\}$. For two classes $\mathcal{T}_1$ and $\mathcal{T}_2$ of translations, we define $\mathcal{T}_1;\mathcal{T}_2 = \{\tau_1;\tau_2 \mid \tau_1 \in \mathcal{T}_1, \tau_2 \in \mathcal{T}_2\}$. The k-fold composition of the class $\mathcal{T}$ of translations is denoted by $\mathcal{T}^k$.
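For finite relations, range(τ) and sequential composition can be computed directly, as in the Python sketch below. Translations realized by mtts are in general infinite, so this only illustrates the set-theoretic definitions. Representing trees by plain strings is an assumption made for brevity.

```python
def compose(tau1, tau2):
    """tau1 ; tau2 = {(a, c) | exists b: (a, b) in tau1 and (b, c) in tau2}."""
    by_input = {}
    for b, c in tau2:                      # index tau2 by its first component
        by_input.setdefault(b, set()).add(c)
    return {(a, c) for a, b in tau1 for c in by_input.get(b, ())}

def range_of(tau):
    """range(tau) = {b | exists a: (a, b) in tau}."""
    return {b for _, b in tau}

TAU1 = {("s1", "u"), ("s1", "v"), ("s2", "w")}
TAU2 = {("u", "t1"), ("v", "t2"), ("x", "t3")}
print(sorted(compose(TAU1, TAU2)))  # [('s1', 't1'), ('s1', 't2')]
```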



Definition 1. A macro tree transducer (mtt) M is a tuple (Q, Σ, Δ, $q_0$, R), where Q is the ranked alphabet of states, Σ and Δ are the input and output alphabets, $q_0 \in Q^{(0)}$ is the initial state, and R is the finite set of rules of the form

$$\langle q, \sigma(x_1, \ldots, x_k)\rangle(y_1, \ldots, y_m) \to r$$

where $q \in Q^{(m)}$, $\sigma \in \Sigma^{(k)}$, and r is a tree in $T_{\Delta \cup (Q \times X_k) \cup Y_m}$. Rules of this form are called (q, σ)-rules, and the set of right-hand sides of all (q, σ)-rules is denoted by $R_{q,\sigma}$. We define the size of the mtt by $|M| = \sum\{|r| \mid r \in R_{q,\sigma}, q \in Q, \sigma \in \Sigma\}$.
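A rule set can be stored as a map from (q, σ) to the list $R_{q,\sigma}$, as in the Python sketch below, which also computes |M|. The tagged-tuple encoding of right-hand sides is a hypothetical choice made for illustration: ("y", i) is $y_i$, ("out", δ, args) is an output symbol, and ("call", q, j, args) is $\langle q, x_j\rangle(\ldots)$. The example rules are those of the Introduction, written in angle-bracket form.

```python
def rhs_size(r):
    """|r|: number of nodes of a right-hand side tree."""
    if r[0] == "y":
        return 1
    return 1 + sum(rhs_size(a) for a in r[-1])  # r[-1] holds the argument tuple

def mtt_size(rules):
    """|M| = sum of |r| over all r in R_{q,sigma}; rules maps (q, sigma) -> [rhs, ...]."""
    return sum(rhs_size(r) for rhss in rules.values() for r in rhss)

E, Y1 = ("out", "e", ()), ("y", 1)
DOUBLE = {
    ("start", "a"): [("call", "double", 1, (("call", "double", 1, (E,)),))],
    ("double", "a"): [("call", "double", 1, (("call", "double", 1, (Y1,)),))],
    ("double", "e"): [("out", "f", (Y1, Y1)), ("out", "g", (Y1, Y1))],
}
print(mtt_size(DOUBLE))  # 12: four right-hand sides with three nodes each
```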



For the remainder of this section, let M be an mtt as in Definition 1. A state q of a macro tree transducer can be regarded as a (nondeterministic) function in a functional programming language. Depending on the order of evaluation, two different semantics can be considered: call-by-value (or inside-out, IO) and call-by-name (or outside-in, OI). Let $\mu \in \{IO, OI\}$. For a tree $u \in T_{\Delta \cup (Q \times T_\Sigma) \cup Y}$, its meaning with respect to M, $[\![u]\!]^M_\mu \subseteq T_{\Delta \cup Y}$, is inductively defined as follows:

[...] $\ldots, t'_m) \mid t'_i \in (t_i \leftarrow_\mu (L_1, \ldots, L_n))$ for all $i\}$.

The difference between IO- and OI-semantics lies in the interpretation of state calls. In IO-semantics we use IO-substitution for parameters: each parameter $y_i$ is bound to some fixed (but nondeterministically chosen) tree in $[\![u_i]\!]_{IO}$, and every occurrence of $y_i$ is replaced with that same tree. In OI-semantics, on the other hand, each parameter is bound to the set of trees $[\![u_i]\!]_{OI}$, and at every occurrence of $y_i$ we nondeterministically choose some tree in $[\![u_i]\!]_{OI}$, independently of the choices made at other occurrences of $y_i$.



As an example of the definition of $[\![u]\!]^M_\mu$, consider the example from the Introduction. There we used a slightly different notation: the right-hand side double(x1, double(x1, e)) is now written as $\langle \mathit{double}, x_1\rangle(\langle \mathit{double}, x_1\rangle(e))$. That is, we enclose the first parameter in angle brackets to distinguish it from the others, because it is the special parameter that is bound to an input tree in $T_\Sigma$, while the others are bound to output trees in $T_\Delta$. Now let us compute $[\![\langle \mathit{start}, a(a(e))\rangle]\!]^M_\mu$.

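$$
\begin{aligned}
[\![\langle \mathit{start}, a(a(e))\rangle]\!]^M_\mu
&= [\![\langle \mathit{double}, a(e)\rangle(\langle \mathit{double}, a(e)\rangle(e))]\!]^M_\mu\\
&= [\![\langle \mathit{double}, e\rangle(\langle \mathit{double}, e\rangle(y_1))]\!]^M_\mu \leftarrow_\mu [\![\langle \mathit{double}, a(e)\rangle(e)]\!]^M_\mu\\
&= \big(\{f(y_1,y_1), g(y_1,y_1)\} \leftarrow_\mu [\![\langle \mathit{double}, e\rangle(y_1)]\!]^M_\mu\big) \leftarrow_\mu [\![\langle \mathit{double}, a(e)\rangle(e)]\!]^M_\mu\\
&= \big(\{f(y_1,y_1), g(y_1,y_1)\} \leftarrow_\mu \{f(y_1,y_1), g(y_1,y_1)\}\big) \leftarrow_\mu [\![\langle \mathit{double}, a(e)\rangle(e)]\!]^M_\mu
\end{aligned}
$$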

Here we encountered the μ-substitution $L \leftarrow_\mu L$ for $L = \{f(y_1,y_1), g(y_1,y_1)\}$. If μ = IO, then

$L \leftarrow_\mu L = \{f(f(y_1,y_1), f(y_1,y_1)),\ g(f(y_1,y_1), f(y_1,y_1)),\ f(g(y_1,y_1), g(y_1,y_1)),\ g(g(y_1,y_1), g(y_1,y_1))\}$;

the size of the set is 2 × 2 = 4. On the other hand, if μ = OI, then we obtain

$L \leftarrow_\mu L = \{f(f(y_1,y_1), f(y_1,y_1)),\ f(f(y_1,y_1), g(y_1,y_1)),\ f(g(y_1,y_1), f(y_1,y_1)),\ f(g(y_1,y_1), g(y_1,y_1)),\ g(f(y_1,y_1), f(y_1,y_1)),\ g(f(y_1,y_1), g(y_1,y_1)),\ g(g(y_1,y_1), f(y_1,y_1)),\ g(g(y_1,y_1), g(y_1,y_1))\}$;

the size is $2 \times 2^2 = 8$, where the exponent 2 is the number of occurrences of the parameter $y_1$ in each target term of the substitution.
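The two substitutions can be sketched in Python on sets of trees with parameters. Here a parameter $y_i$ is the string "yi" and every other node is a tuple (label, children...); this encoding and the function names are illustrative assumptions. IO-substitution picks one tree per parameter for all of its occurrences. OI-substitution expands every occurrence separately.

```python
from itertools import product

def subst_io(L, Ls):
    """L <-IO (L1, ..., Ln): choose ONE tree per y_i and use it at every occurrence."""
    def apply(t, env):
        if isinstance(t, str):                       # parameter "y1", "y2", ...
            return env.get(t, t)
        return (t[0],) + tuple(apply(c, env) for c in t[1:])
    names = [f"y{i + 1}" for i in range(len(Ls))]
    return {apply(t, dict(zip(names, choice))) for t in L for choice in product(*Ls)}

def subst_oi(L, Ls):
    """L <-OI (L1, ..., Ln): every occurrence of y_i chooses its own tree from L_i."""
    sets = {f"y{i + 1}": Li for i, Li in enumerate(Ls)}
    def expand(t):
        if isinstance(t, str):
            return set(sets.get(t, {t}))
        return {(t[0],) + kids for kids in product(*(expand(c) for c in t[1:]))}
    return set().union(*(expand(t) for t in L))

L = {("f", "y1", "y1"), ("g", "y1", "y1")}
print(len(subst_io(L, [L])), len(subst_oi(L, [L])))  # 4 8
```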

We define the translation realized by M in μ-mode as the relation $\tau_{\mu,M} = \{(s, t) \in T_\Sigma \times T_\Delta \mid t \in [\![\langle q_0, s\rangle]\!]^M_\mu\}$. The class of all translations realized by mtts in μ-mode is denoted by $MTT_\mu$. An mtt is called deterministic (respectively, total) if for every q, σ, the number of rules $|R_{q,\sigma}|$ is at most (respectively, at least) 1. The corresponding classes of translations are denoted by the prefix D (respectively, t). An mtt is called linear (in the input variables) if in every right-hand side of the rules each input variable $x_i$ appears at most once. The corresponding class of translations is denoted by the prefix L. For example, the class of translations realized by linear, deterministic, and total mtts in OI mode is denoted by $LDtMTT_{OI}$.
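The three syntactic restrictions can be checked directly on a rule set. The Python sketch below reuses the hypothetical tagged-tuple encoding of right-hand sides and passes the state and input-symbol sets explicitly. The doubling mtt of the Introduction is neither deterministic, nor total (it has no (start, e)-rule), nor linear. A small identity transducer, introduced only for this example, satisfies all three restrictions.

```python
def is_deterministic(rules):
    """D: |R_{q,sigma}| <= 1 for all q, sigma."""
    return all(len(rhss) <= 1 for rhss in rules.values())

def is_total(rules, states, input_symbols):
    """t: |R_{q,sigma}| >= 1 for all states q and input symbols sigma."""
    return all(rules.get((q, s)) for q in states for s in input_symbols)

def input_var_counts(r, counts):
    """Count occurrences of each input variable x_j (inside <q, x_j> calls) in r."""
    if r[0] == "y":
        return counts
    if r[0] == "call":
        counts[r[2]] = counts.get(r[2], 0) + 1
    for a in r[-1]:
        input_var_counts(a, counts)
    return counts

def is_linear(rules):
    """L: each x_j occurs at most once in every right-hand side."""
    return all(max(input_var_counts(r, {}).values(), default=0) <= 1
               for rhss in rules.values() for r in rhss)

E, Y1 = ("out", "e", ()), ("y", 1)
DOUBLE = {
    ("start", "a"): [("call", "double", 1, (("call", "double", 1, (E,)),))],
    ("double", "a"): [("call", "double", 1, (("call", "double", 1, (Y1,)),))],
    ("double", "e"): [("out", "f", (Y1, Y1)), ("out", "g", (Y1, Y1))],
}
IDENTITY = {("q", "a"): [("out", "a", (("call", "q", 1, ()),))], ("q", "e"): [E]}
print(is_deterministic(DOUBLE), is_total(DOUBLE, {"start", "double"}, {"a", "e"}),
      is_linear(DOUBLE))  # False False False
```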

For a translation $\tau \subseteq T_\Sigma \times T_\Delta$, the translation membership problem for τ is the decision problem that determines, given a tree $s \in T_\Sigma$ and a tree $t \in T_\Delta$, whether $(s, t) \in \tau$. In the rest of the paper, we focus on the data complexity of this problem. That is, we measure the complexity in terms of |s| + |t|, regarding the translation τ as fixed. We always denote the input tree and the output tree given to the problem by s and t.



3. NP-COMPLETE CLASSES 

The first result is that translation membership for OI-mtts is NP-hard, even for linear mtts. The proof is a reduction from 3-SAT and resembles the proof in [18] of NP-completeness of the membership problem for indexed languages. In fact, the indexed languages can be obtained as yields (strings of leaves from left to right) of output languages of linear mtts. This is because each indexed language is the yield of some OI context-free tree language [7], and each OI context-free tree language is equal to the range range(τ) of some $\tau \in LMTT_{OI}$ by Corollary 6.13 in [6]. However, given a word w as input for the membership problem of an indexed language L, it is not clear how to construct a pair (s, t) such that $(s, t) \in \tau_{OI,M}$ for some linear mtt M if and only if w is in L. We can choose $s = a^n$ with n = length(w) and an LMTT that produces trees t whose yield is the word w. But how do we select such a tree t as input for the translation membership problem? Note that it is easy to construct from w an input for translation membership for a two-fold composition of mtts: the second transducer realizes "yield", i.e., it turns a tree t into a monadic tree that represents t's yield (such a transducer is even total deterministic). It follows that translation membership for two-fold compositions of mtts is NP-hard. This was already mentioned in [11]. The next lemma shows that translation membership is NP-hard even for a single linear mtt.



Lemma 1. Translation membership for $LMTT_{OI}$ (and hence $MTT_{OI}$) is NP-hard.



Proof. We construct an mtt M = (Q, Σ, Δ, $q_0$, R) that, given the number of variables n and the number of clauses m as inputs, generates the parse trees of all satisfiable boolean formulas in 3-conjunctive normal form. We slightly abuse our notation and write $y_v, y_t, y_f$ in place of $y_1, y_2, y_3$, respectively. Let $Q = \{q_0^{(0)}, q_c^{(2)}, q^{(3)}\}$, $\Sigma = \{a^{(1)}, b^{(3)}, c^{(1)}, d^{(0)}\}$, $\Delta = \{\wedge^{(2)}, \vee^{(3)}, \neg^{(1)}, v^{(1)}, e^{(0)}\}$, and let R be the following set of rules:

⟨q0, a(x1)⟩ → ⟨q, x1⟩(v(e), e, ¬(e))
⟨q0, a(x1)⟩ → ⟨q, x1⟩(v(e), ¬(e), e)
⟨q, b(x1, x2, x3)⟩(y_v, y_t, y_f) → ⟨q, x1⟩(v(y_v), ⟨q_c, x2⟩(y_t, y_v), ⟨q_c, x3⟩(y_f, ¬(y_v)))
⟨q, b(x1, x2, x3)⟩(y_v, y_t, y_f) → ⟨q, x1⟩(v(y_v), ⟨q_c, x2⟩(y_t, ¬(y_v)), ⟨q_c, x3⟩(y_f, y_v))
⟨q_c, d⟩(y1, y2) → y1
⟨q_c, d⟩(y1, y2) → y2
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_t, y_t, y_t), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_t, y_t, y_f), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_t, y_f, y_t), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_f, y_t, y_t), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_t, y_f, y_f), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_f, y_f, y_t), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, c(x1)⟩(y_v, y_t, y_f) → ∧(∨(y_f, y_t, y_f), ⟨q, x1⟩(y_v, y_t, y_f))
⟨q, d⟩(y_v, y_t, y_f) → ∨(y_t, y_t, y_t)
  ⋮ (same as the ∨(···) part of the (q, c)-rules)
⟨q, d⟩(y_v, y_t, y_f) → ∨(y_f, y_t, y_f).



From an input tree $a(b(b(\cdots b(c^m d, d, d)\cdots), d, d))$ of size 3n + m + 2, M generates all satisfiable boolean formulas in 3-conjunctive normal form with n variables and m conjuncts. The output language encodes boolean formulas as follows. A boolean variable $p_i$ for $0 \le i < n$ is represented as $v^i e$, and the three boolean operations ¬, ∧, and ∨ are represented by the corresponding output symbols. For example, the formula $(p_0 \vee \neg p_1 \vee p_2) \wedge (\neg p_0 \vee p_1 \vee p_2)$ is encoded as ∧(∨(e, ¬ve, vve), ∨(¬e, ve, vve)).
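The encoding of a formula as an output tree t is easy to compute, as the following Python sketch shows. The ASCII labels "and", "or", "not" and the function names are our own illustrative choices. Consistent with the (q, c)-rules, the binary ∧ is nested to the right, and each clause is a ternary ∨.

```python
def encode_var(i):
    """p_i is encoded as v^i e, i.e. v(v(...v(e)...)) with i v-nodes."""
    t = ("e",)
    for _ in range(i):
        t = ("v", t)
    return t

def encode_literal(i, positive):
    return encode_var(i) if positive else ("not", encode_var(i))

def encode_cnf(clauses):
    """clauses: non-empty list of 3-tuples of (variable index, is_positive)."""
    encoded = [("or",) + tuple(encode_literal(i, pos) for i, pos in c) for c in clauses]
    t = encoded[-1]
    for c in reversed(encoded[:-1]):  # and(C1, and(C2, ... Cm))
        t = ("and", c, t)
    return t

# (p0 or not p1 or p2) and (not p0 or p1 or p2)
EXAMPLE = [((0, True), (1, False), (2, True)), ((0, False), (1, True), (2, True))]
print(encode_cnf(EXAMPLE))
```

Because t grows only linearly with the formula size, this step of the reduction runs in polynomial time.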

Intuitively, when the mtt reads the root node of the input, it nondeterministically assigns a truth value to the first variable $p_0$. The first $(q_0, a)$-rule covers the case where it assigns 'true', and the other rule covers 'false'. Three parameters are passed to the state q. The first parameter $y_v$ denotes the name of the next variable to be assigned a truth value. The second parameter $y_t$ (respectively, the third parameter $y_f$) denotes the set of 'true' ('false') literals, namely variables or negated variables, that have been constructed so far. While reading b-nodes in the state q, the mtt nondeterministically assigns a truth value to each of the variables $p_1$ to $p_{n-1}$, in the same way as for $p_0$. Here, OI-nondeterminism is crucial for representing an arbitrary choice of literals: each time $y_t$ and $y_f$ are copied to the output, they contain unevaluated "combs" of $q_c$-calls (on d-nodes). Each such comb represents the nondeterministic choice of any of the 'true' ($y_t$) or 'false' ($y_f$) literals generated so far. The state $q_c$ represents the union of two sets: it takes two parameters and nondeterministically returns either one of them. For example, the parameter $y_t$ may be assigned the unevaluated expression $\langle q_c, d\rangle(\langle q_c, d\rangle(\neg p_0, p_1), p_2)$, and each time the value of $y_t$ is needed, it is nondeterministically evaluated to $\neg p_0$, $p_1$, or $p_2$. Then, while reading the c-nodes of the input, the transducer generates m conjunctions of 'true' clauses. Since we generate 3-CNF formulas, each clause consists of a disjunction of exactly three literals. There are seven possibilities, namely all combinations of $y_t$ and $y_f$ except $\vee(y_f, y_f, y_f)$, and these are generated by the (q, c)-rules of the transducer.
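The seven clause shapes can be enumerated mechanically: a disjunction of three literals is true under the chosen assignment exactly when at least one literal is taken from $y_t$. The following Python snippet lists them.

```python
from itertools import product

# Each of the three literal slots is filled from y_t ('true' literals) or
# y_f ('false' literals); a clause is satisfied iff some slot uses y_t.
patterns = [p for p in product(("y_t", "y_f"), repeat=3) if p != ("y_f",) * 3]
print(len(patterns))  # 7
```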

It should be clear to the reader that this mtt generates all (and only) satisfiable 3-CNF formulas. It nondeterministically constructs any of the $2^n$ possible assignments to the variables $p_0, \ldots, p_{n-1}$, and under each assignment it generates any of the $7^m$ possible types of 'true' formulas. The key point is that the choices at $\langle q_c, d\rangle$, which enumerate all possible literals, are evaluated nondeterministically each time a disjunct is generated. In contrast, the choices at $\langle q_0, a\rangle$ and $\langle q, b\rangle$, which enumerate all possible truth-value assignments, are evaluated and uniformly determined before any conjunct is generated.

It is also obvious that, given any 3-CNF formula, we can compute its encoding described above in polynomial time to obtain t, and count the number of variables and clauses to obtain s. Then $(s, t) \in \tau_{OI,M}$ if and only if the original formula is satisfiable. It is well known that the satisfiability of 3-CNF formulas is NP-complete (see, e.g., [8]). □

In [11], we proved two closely related results. The first is that the above NP lower bound is tight, i.e., translation membership for $LMTT_{OI}$ can be decided in NP. The second is that the membership problem for the output language is in NP, even for finitely many compositions of $MTT_{OI}$'s. Together, these give the following theorem.

Theorem 2. Translation membership for $MTT^n_{OI}$ for $n \ge 1$ is NP-complete.

Proof. NP-hardness follows from the preceding lemma. Let $\tau \in MTT^n_{OI}$. We can easily construct a translation $\tau' = \{(s, \pi(s, t)) \mid (s, t) \in \tau\}$ in $MTT^n_{OI}$, where π is a new binary symbol. This is done by changing the first mtt $M_1$ of the composition (with input alphabet Σ and initial state $q_0$) as follows. For every $\sigma \in \Sigma^{(k)}$, replace every $(q_0, \sigma)$-rule with right-hand side r by the new rule $\langle q_0, \sigma(x_1, \ldots, x_k)\rangle \to \pi(\sigma(\langle q_{id}, x_1\rangle, \ldots, \langle q_{id}, x_k\rangle), r)$, and introduce the rule $\langle q_{id}, \sigma(x_1, \ldots, x_k)\rangle \to \sigma(\langle q_{id}, x_1\rangle, \ldots, \langle q_{id}, x_k\rangle)$ for the new state $q_{id}$ of rank 0. Then each subsequent mtt $M_i$ ($2 \le i \le n$) is augmented by the new rule $\langle q_0, \pi(x_1, x_2)\rangle \to \pi(\langle q_{id}, x_1\rangle, \langle q_0, x_2\rangle)$ and by $q_{id}$-rules as for $M_1$. Then $(s, t) \in \tau$ if and only if $\pi(s, t) \in \mathit{range}(\tau')$. Since, by Theorem 8 of [11], the membership test for $\mathit{range}(\tau')$ is in NP, we can also check $(s, t) \in \tau$ in NP. □

Compositions of two $MTT_{IO}$'s can simulate all $MTT_{OI}$ translations (Theorem 6.10 of [6]), and conversely, compositions of $MTT_{IO}$'s can be simulated by compositions of $MTT_{OI}$'s (Theorem 7.8 of [6]). Therefore, we also obtain NP-completeness for compositions of $MTT_{IO}$'s.

Corollary 3. Translation membership for $MTT^n_{IO}$ for $n \ge 2$ is NP-complete.

4. TRACTABLE CLASSES 

In this section, we first prove that, in contrast to OI-mtts, IO-mtts have polynomial-time translation membership. We then extend the result to several extensions of IO-mtts and to some restricted subclasses of OI-mtts.

The proof is based on inverse type inference for mtts M (Theorem 7.4 of [6]): given a finite tree automaton B that accepts output trees, we can effectively construct a finite tree automaton that recognizes the corresponding input trees $\tau_M^{-1}(L(B))$. Given an output tree t, we construct its minimal dag representation, i.e., the pointer representation of t in which all isomorphic subtrees are shared. We can then regard this dag as the trivial deterministic automaton $B_t$ with at most |t| states, which recognizes {t}. Once we have constructed the automaton A for $\tau_M^{-1}(L(B_t))$, translation membership for (s, t) reduces to checking whether $s \in L(A)$. However, the automaton A can be very large: in the worst case, its number of states is exponential in $|B_t|$. Thus, to obtain PTIME complexity we must avoid constructing A fully. Our idea is to construct A on demand while running it on the tree s. Inverse type inference for an IO-mtt constructs an input-type automaton whose states are functions p from Q to $(V^m \to 2^V)$. Here V is the set of states of $B_t$, Q is the set of states of M, and m is the maximum rank of states in Q. For each $q \in Q$, such a state p tells us which states of $B_t$ are obtained if we apply q to an input tree. That is, if A reaches the state p after reading a tree s, then running $B_t$ on the output trees in $[\![\langle q, s\rangle(t|_{v_1}, \ldots, t|_{v_m})]\!]$ yields the states $(p(q))(v_1, \ldots, v_m)$.

Theorem 4. Let M be an mtt. Translation membership for $\tau_{IO,M}$ can be decided in time $O(|s| \cdot |t|^{2m+2} \cdot |M|)$, where m is the maximum rank of M's states.

Proof. Let $t_{dag}$ be the minimal dag representing t. It is folklore that $t_{dag}$ can be computed in amortized linear time in |t| using hashing, and even in linear time using pseudo radix sorting; see [3]. Let $V_t$ be the set of nodes of $t_{dag}$. For a node $v \in V_t$, label(v) denotes the label in Δ of v, and child(v, i) denotes the i-th child node of v. Assuming a standard pointer structure for dags, each call of label and child takes O(1) time.
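The hashing-based construction of $t_{dag}$ can be sketched as hash-consing in Python. A dict plays the role of the hash table, which gives the amortized linear-time variant rather than the radix-sorting one. The (label, children) tree encoding and the names minimal_dag and T are assumptions made for illustration.

```python
def minimal_dag(t):
    """Share isomorphic subtrees of t = (label, (child, ...)) by hash-consing.

    Returns (root, label_of, children_of): dag nodes are integers 0..N-1,
    label_of[v] is label(v), and children_of[v][i - 1] is child(v, i)."""
    ids, label_of, children_of = {}, [], []

    def visit(node):
        label, kids = node
        key = (label, tuple(visit(k) for k in kids))  # children are already shared
        if key not in ids:                            # first occurrence of this subtree
            ids[key] = len(label_of)
            label_of.append(label)
            children_of.append(key[1])
        return ids[key]

    return visit(t), label_of, children_of

def T(label, *kids):
    return (label, kids)

t = T("f", T("g", T("e"), T("e")), T("g", T("e"), T("e")))  # 7 nodes
root, label_of, children_of = minimal_dag(t)
print(len(label_of), label_of[root], children_of[root])  # 3 f (1, 1)
```

Since equal subtrees receive the same integer, testing whether two subtrees of t are equal reduces to comparing two integers.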

Let ⊥ be an element not in $V_t$. Let $V = V_t \cup \{\bot\}$, and let label(⊥) be undefined. Let $\mathit{run}: T_\Sigma \to A$ with $A = 2^{\bigcup_i Q^{(i)} \times V^i \times V}$ be the function defined inductively as follows:

$$\mathit{run}(\sigma(s_1, \ldots, s_k)) = \mathit{tr}(\sigma, \mathit{run}(s_1), \ldots, \mathit{run}(s_k))$$

where tr is defined below. The set A contains the states of the deterministic bottom-up automaton for $\tau_{IO,M}^{-1}(\{t\})$, tr is its transition function, and run computes the run of the automaton. Intuitively, "$(q, \vec v, v') \in \mathit{run}(s')$" means "if q is applied to the input subtree s' with the output subtrees rooted at $\vec v$ as parameters, then it may generate the output subtree rooted at v'". The special value $\bot \in V$ denotes a tree that is not a subtree of t. For example, "$(q, \vec v, \bot) \in \mathit{run}(s')$" means that applying q to s' with parameters $\vec v$ may yield a tree that is not a subtree of t.
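The transition function $\mathit{tr}: \Sigma^{(k)} \times A^k \to A$ is defined as follows:

$$\mathit{tr}(\sigma, \vec\alpha) = \{(q, \vec v, v') \in \textstyle\bigcup_i Q^{(i)} \times V^i \times V \mid \exists r \in R_{q,\sigma} : f_{\vec v,\vec\alpha}(r, v')\}$$

where $f_{\vec v,\vec\alpha}: T_{\Delta \cup (Q \times X) \cup Y} \times V \to \{\mathit{true}, \mathit{false}\}$ is defined inductively on the right-hand sides of the rules:

$$
\begin{aligned}
f_{\vec v,\vec\alpha}(y_i, v') &= \mathit{true} \quad \text{if } v' = v_i\\
f_{\vec v,\vec\alpha}(y_i, v') &= \mathit{false} \quad \text{if } v' \neq v_i\\
f_{\vec v,\vec\alpha}(\delta(r_1, \ldots, r_n), v') &= \big(\mathit{label}(v') = \delta\big) \wedge \bigwedge_{1 \le i \le n} f_{\vec v,\vec\alpha}(r_i, \mathit{child}(v', i)) \quad \text{if } v' \in V_t\\
f_{\vec v,\vec\alpha}(\delta(r_1, \ldots, r_n), \bot) &= \exists \vec u \in V^n : \Big(\bigwedge_{1 \le i \le n} f_{\vec v,\vec\alpha}(r_i, u_i)\Big) \wedge \Big(\forall u' \in V_t : \neg\big(\mathit{label}(u') = \delta \wedge \bigwedge_{1 \le i \le n} \mathit{child}(u', i) = u_i\big)\Big)\\
f_{\vec v,\vec\alpha}(\langle q', x_j\rangle(r_1, \ldots, r_n), v') &= \exists \vec u \in V^n : \Big((q', \vec u, v') \in \alpha_j \wedge \bigwedge_{1 \le i \le n} f_{\vec v,\vec\alpha}(r_i, u_i)\Big)
\end{aligned}
$$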

The relation fs,a{f, v') should be understood as: "evaluation 
of r will yield the output subtree at v', under the assumption 
that the parameters y are bound to v and the effects of 
application of a state to each child is as described by a". 

For a tree t' G T A , let p(t') be v G V t if t' = t\ v , and 
p(t') = _L otherwise. We also define p{T) for T C T A as 
{p(i) | t G T}. The correctness of the above construction 
is verified by the following claim. Note that the claim is 
just rephrasing the intuition of the set of states A explained 
above, in a formal way. 

Claim. For every input tree s', every $q \in Q$, and all $r_1, \ldots, r_n \in T_{\Delta \cup (Q \times T_\Sigma)}$, the following equation holds:

$$\rho([\![\langle q, s'\rangle(r_1, \ldots, r_n)]\!]_{IO}) = \{v' \mid (q, (v_1, \ldots, v_n), v') \in \mathit{run}(s'),\ v_i \in \rho([\![r_i]\!]_{IO}) \text{ for all } i\}.$$

Applying the claim with $q = q_0$ and s' = s, we see that $t \in [\![\langle q_0, s\rangle]\!]_{IO}$ holds if and only if $(q_0, (), v_\varepsilon) \in \mathit{run}(s)$, where $v_\varepsilon$ is the root node of $t_{dag}$. Hence, translation membership can be decided by computing the set run(s).
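The construction can be executed directly. The following Python sketch implements run, tr, and $f_{\vec v,\vec\alpha}$ literally, using the hypothetical tagged-tuple rule encoding from the earlier sketches, with None standing for ⊥. It is not the optimized procedure behind the bound of Theorem 4: it does not memoize f on subexpressions and does not use the counting trick for the ⊥ case. On the doubling mtt from the Introduction, it accepts a tree that IO mode can produce and rejects one that only OI mode can produce.

```python
from itertools import product

BOT = None  # stands for the special value ⊥ ("not a subtree of t")

def T(label, *kids):
    return (label, kids)

def minimal_dag(t):
    ids, label_of, children_of = {}, [], []
    def visit(node):
        label, kids = node
        key = (label, tuple(visit(k) for k in kids))
        if key not in ids:
            ids[key] = len(label_of)
            label_of.append(label)
            children_of.append(key[1])
        return ids[key]
    return visit(t), label_of, children_of

def member_io(rules, rank, q0, s, t):
    """Decide t in [[<q0, s>]]_IO by computing run(s) bottom-up."""
    root, label_of, children_of = minimal_dag(t)
    nodes = range(len(label_of))
    V = list(nodes) + [BOT]

    def f(r, vs, v, alpha):                       # f_{vs,alpha}(r, v)
        if r[0] == "y":                           # parameter y_i
            return vs[r[1] - 1] == v
        if r[0] == "out":                         # output symbol delta(r1..rn)
            _, delta, args = r
            if v is not BOT:
                kids = children_of[v]
                return (label_of[v] == delta and len(kids) == len(args)
                        and all(f(a, vs, k, alpha) for a, k in zip(args, kids)))
            return any(all(f(a, vs, ui, alpha) for a, ui in zip(args, us))
                       and not any(label_of[x] == delta and children_of[x] == us
                                   for x in nodes)
                       for us in product(V, repeat=len(args)))
        _, q2, j, args = r                        # state call <q2, x_j>(r1..rn)
        return any((q2, us, v) in alpha[j - 1]
                   and all(f(a, vs, ui, alpha) for a, ui in zip(args, us))
                   for us in product(V, repeat=len(args)))

    def run(node):                                # run(sigma(s1..sk)) = tr(sigma, ...)
        sigma, kids = node
        alpha = [run(k) for k in kids]
        return frozenset((q, vs, v)
                         for (q, sym), rhss in rules.items() if sym == sigma
                         for vs in product(V, repeat=rank[q])
                         for v in V
                         if any(f(r, vs, v, alpha) for r in rhss))

    return (q0, (), root) in run(s)

E, Y1 = ("out", "e", ()), ("y", 1)
DOUBLE = {
    ("start", "a"): [("call", "double", 1, (("call", "double", 1, (E,)),))],
    ("double", "a"): [("call", "double", 1, (("call", "double", 1, (Y1,)),))],
    ("double", "e"): [("out", "f", (Y1, Y1)), ("out", "g", (Y1, Y1))],
}
RANK = {"start": 0, "double": 1}
s = T("a", T("e"))
gee = T("g", T("e"), T("e"))
print(member_io(DOUBLE, RANK, "start", s, T("f", gee, gee)))                         # True
print(member_io(DOUBLE, RANK, "start", s, T("f", gee, T("f", T("e"), T("e")))))  # False
```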

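The claim is proved by nested induction, first on the structure of s' and then on the structure of the right-hand sides of the rules. Let $s' = \sigma(s_1, \ldots, s_k)$; the base case is k = 0. By definition of the IO-semantics we have

$$\rho([\![\langle q, s'\rangle(r_1, \ldots, r_n)]\!]_{IO}) = \bigcup_{r \in R_{q,\sigma}} \{\rho(t'[y_1/t_1, \ldots, y_n/t_n]) \mid t' \in [\![r[\vec x/\vec s]]\!]_{IO},\ t_i \in [\![r_i]\!]_{IO} \text{ for all } i\}$$

and by definition of run we have

$$\{v' \mid (q, \vec v, v') \in \mathit{run}(s'),\ v_i \in \rho([\![r_i]\!]_{IO})\} = \{v' \mid \exists r \in R_{q,\sigma} : f_{\vec v,\vec\alpha}(r, v'),\ v_i \in \rho([\![r_i]\!]_{IO})\}$$

where $\vec\alpha = (\mathit{run}(s_1), \ldots, \mathit{run}(s_k))$. To show that these two sets are equal, it suffices to prove the following statement: if $\rho(t_i) = v_i$ for all i, then $\{\rho(t'[\vec y/\vec t]) \mid t' \in [\![r[\vec x/\vec s]]\!]_{IO}\} = \{v' \mid f_{\vec v,\vec\alpha}(r, v')\}$. The proof is by nested induction on the structure of r. For example, if $r = \langle q', x_j\rangle(r_1, \ldots, r_n)$, we have $\{v' \mid f_{\vec v,\vec\alpha}(\langle q', x_j\rangle(r_1, \ldots, r_n), v')\} = \{v' \mid (q', \vec u, v') \in \alpha_j,\ f_{\vec v,\vec\alpha}(r_i, u_i) \text{ for all } i\}$. By the inner induction hypothesis this equals $\{v' \mid (q', \vec u, v') \in \alpha_j,\ u_i \in \rho([\![r_i[\vec x/\vec s, \vec y/\vec t]]\!]_{IO}) \text{ for all } i\}$, and by the outer induction hypothesis this in turn equals $\rho([\![\langle q', s_j\rangle(r_1[\vec x/\vec s, \vec y/\vec t], \ldots, r_n[\vec x/\vec s, \vec y/\vec t])]\!]_{IO}) = \{\rho(t'[\vec y/\vec t]) \mid t' \in [\![r[\vec x/\vec s]]\!]_{IO}\}$. The other cases are proved similarly.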

The time complexity of testing $(q_0, (), v_\varepsilon) \in \mathit{run}(s)$ is obtained as follows. The value run(s) for the whole input tree s is computed by executing the function tr once on each node of s, bottom-up, as bottom-up tree automata do, so that the states in $\vec\alpha$ are already available. Hence tr is executed |s| times. The set $\mathit{tr}(\sigma, \vec\alpha)$ can be constructed by testing, with $f_{\vec v,\vec\alpha}$, all combinations of $(q, \vec v, v') \in \bigcup_i Q^{(i)} \times V^i \times V$ (at most $|Q| \cdot |V|^{m+1}$ of them) and $r \in R_{q,\sigma}$. The function $f_{\vec v,\vec\alpha}$ may receive $|r| \cdot |V|$ different pairs of arguments. Assuming that the values of $f_{\vec v,\vec\alpha}$ are already computed for all subexpressions of r', each value $f_{\vec v,\vec\alpha}(r', v')$ takes $O(|V|^m)$ time in the worst case, namely in the $f_{\vec v,\vec\alpha}(\langle q', x_j\rangle(\cdots))$ case. Hence, $O(|r| \cdot |V|^{m+1})$ time suffices here. The $f_{\vec v,\vec\alpha}(\delta(\cdots), \bot)$ case can be computed in O(|V|) time by remembering the number $|\{v \mid f_{\vec v,\vec\alpha}(r', v)\}|$ for each subexpression r'. The existence of $\vec u$ is then checked by verifying that this number is non-zero, and the check $\mathit{child}(u', i) \neq u_i$ is replaced with "either $f_{\vec v,\vec\alpha}(r', \mathit{child}(u', i))$ is false, or the number is greater than one". Since this case is computed at most |r| times, it takes O(|r| · |V|) time in total, which is subsumed by $O(|r| \cdot |V|^{m+1})$. Multiplying these factors yields the desired bound $O(|s| \cdot |t|^{2m+2} \cdot |M|)$, since $|V| \le |t| + 1$ by definition and |M| subsumes $\sum_{q \in Q, r \in R_{q,\sigma}} |r|$. □

The reader may wonder why the same approach does not work for OI-mtts, whose inverses also preserve the regular tree languages. The problem is that, for OI, the states of the inferred automata are in $A = 2^{\bigcup_i Q^{(i)} \times (2^V)^i \times V}$ instead of $A = 2^{\bigcup_i Q^{(i)} \times V^i \times V}$. Intuitively, in IO-mtts every copy of the same parameter is an identical output tree and thus corresponds to a single node in V, while in OI-mtts each copy is evaluated independently and may therefore correspond to a different output node. To capture this in inverse type inference, each parameter must be represented by a set of nodes rather than by a single output node. Because of the additional exponential, a single state in A, i.e., a subset of $\bigcup_i Q^{(i)} \times (2^V)^i \times V$, can already be exponentially large. Therefore, on-the-fly construction does not yield a PTIME algorithm. Of course, Lemma 1 implies that there is no PTIME algorithm for translation membership for OI-mtts unless P = NP.
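The gap can be quantified with a back-of-the-envelope count of the candidate elements of a single automaton state, assuming |Q| states of rank at most m and |V| output nodes. The concrete numbers below are illustrative only and are not measurements.

$$
\begin{aligned}
\text{IO:}\quad \Big|\textstyle\bigcup_{i \le m} Q^{(i)} \times V^{i} \times V\Big| &\le |Q| \cdot |V|^{m+1},\\
\text{OI:}\quad \Big|\textstyle\bigcup_{i \le m} Q^{(i)} \times (2^{V})^{i} \times V\Big| &\le |Q| \cdot 2^{m|V|} \cdot |V|.
\end{aligned}
$$

For instance, with |Q| = 2, m = 1, and |V| = 16, the IO bound is $2 \cdot 16^2 = 512$, whereas the OI bound is $2 \cdot 2^{16} \cdot 16 = 2{,}097{,}152$.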



Nevertheless, some subclasses of OI-mtts still admit PTIME translation membership. The essential difficulty of OI translation membership comes from the copying of parameters. Consider, for example, an OI-mtt that is linear in the parameters, i.e., in every right-hand side each parameter $y_i$ occurs at most once. Then each parameter is either used once or not used at all, so it can be represented in the inverse-type automaton by a set of size at most 1. More generally, if an OI-mtt is finite copying in the parameters, its translation membership can be tested in polynomial time. An mtt is finite copying in the parameters if there exists a constant c such that for any q, s, and $u \in [\![\langle q, s\rangle(y_1, \ldots, y_k)]\!]$, the number of occurrences of $y_i$ in u is at most c. The number c is called a (parameter) copying bound of M. "Linear-in-parameter" mtts are a special case of finite copying mtts with copying bound 1. Moreover, linearity in the parameters can be verified by simply counting the syntactic occurrences of each parameter in the rules, whereas finite copying is in general a semantic property of mtts. Finite copying is nevertheless decidable, and the copying bound can be effectively obtained (see Lemma 4.10 of [5]; it is proved there only for total deterministic mtts, but the same technique also works for IO- and OI-nondeterministic mtts).
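The syntactic test for linearity in the parameters is a simple occurrence count per right-hand side, as in the Python sketch below, which again uses the hypothetical tagged-tuple rule encoding. It checks only this special case. It does not decide finite copying in general, which is the semantic property discussed above, and the per-rule maximum is not a copying bound in general, because nested state calls can multiply copies. The LINEAR variant of the doubling mtt is made up for this example.

```python
def param_counts(r, counts=None):
    """Count syntactic occurrences of each parameter index i (for y_i) in a right-hand side."""
    counts = {} if counts is None else counts
    if r[0] == "y":
        counts[r[1]] = counts.get(r[1], 0) + 1
    else:
        for a in r[-1]:  # children of an output symbol, or arguments of a state call
            param_counts(a, counts)
    return counts

def is_linear_in_parameters(rules):
    """Every y_i occurs at most once in every right-hand side."""
    return all(max(param_counts(r).values(), default=0) <= 1
               for rhss in rules.values() for r in rhss)

E, Y1 = ("out", "e", ()), ("y", 1)
DOUBLE = {
    ("start", "a"): [("call", "double", 1, (("call", "double", 1, (E,)),))],
    ("double", "a"): [("call", "double", 1, (("call", "double", 1, (Y1,)),))],
    ("double", "e"): [("out", "f", (Y1, Y1)), ("out", "g", (Y1, Y1))],
}
# Hypothetical linear variant: double(e, y1) -> f(y1, e) | g(e, y1)
LINEAR = dict(DOUBLE)
LINEAR[("double", "e")] = [("out", "f", (Y1, E)), ("out", "g", (E, Y1))]
print(is_linear_in_parameters(DOUBLE), is_linear_in_parameters(LINEAR))  # False True
```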

Theorem 5. Let M be an mtt that is finite copying in the parameters with copying bound c. Then translation membership for $\tau_{OI,M}$ can be decided in time $O(|s| \cdot |t|^{c(2m+2)} \cdot c \cdot |M|)$, where m is the maximum rank of M's states.

Proof. Let $t_{dag}$ be the minimal dag representing t, and let V be the set of nodes of $t_{dag}$. For a node $v \in V$, label(v) denotes the label in Δ of v, and child(v, i) denotes the i-th child node of v.

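Let $A = 2^{\bigcup_i Q^{(i)} \times \mathcal{P}_c(V)^i \times V}$, where $\mathcal{P}_c(V) = \{S \subseteq V \mid |S| \le c\}$, and let the function run be defined as follows:

$$\mathit{run}(\sigma(s_1, \ldots, s_k)) = \mathit{tr}(\sigma, \mathit{run}(s_1), \ldots, \mathit{run}(s_k)).$$

The transition function $\mathit{tr}: \Sigma^{(k)} \times A^k \to A$ is defined as follows:

$$\mathit{tr}(\sigma, \vec\alpha) = \{(q, \vec\beta, v') \in \textstyle\bigcup_i Q^{(i)} \times \mathcal{P}_c(V)^i \times V \mid \exists r \in R_{q,\sigma} : f_{\vec\beta,\vec\alpha}(r, v')\}$$

where $f_{\vec\beta,\vec\alpha}: T_{\Delta \cup (Q \times X) \cup Y} \times V \to \{\mathit{true}, \mathit{false}\}$ is defined as follows:

$$
\begin{aligned}
f_{\vec\beta,\vec\alpha}(y_i, v') &= \mathit{true} && \text{if } v' \in \beta_i\\
f_{\vec\beta,\vec\alpha}(y_i, v') &= \mathit{false} && \text{if } v' \notin \beta_i\\
f_{\vec\beta,\vec\alpha}(\delta(r_1, \ldots, r_n), v') &= \bigwedge_{1 \le i \le n} f_{\vec\beta,\vec\alpha}(r_i, \mathit{child}(v', i)) && \text{if } \mathit{label}(v') = \delta\\
f_{\vec\beta,\vec\alpha}(\delta(r_1, \ldots, r_n), v') &= \mathit{false} && \text{if } \mathit{label}(v') \neq \delta\\
f_{\vec\beta,\vec\alpha}(\langle q', x_j\rangle(r_1, \ldots, r_n), v') &= \exists \vec\gamma : \big((q', \vec\gamma, v') \in \alpha_j \text{ and } f_{\vec\beta,\vec\alpha}(r_i, u) \text{ for all } i \text{ and } u \in \gamma_i\big)
\end{aligned}
$$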

This time V does not contain the element ⊥; the empty set plays its role instead. The complexity of this algorithm is computed as in the case of IO-mtts. We test with $f_{\vec\beta,\vec\alpha}$ all combinations of $(q, \vec\beta, v') \in \bigcup_i Q^{(i)} \times \mathcal{P}_c(V)^i \times V$, of which there are now $O(|Q| \cdot |V|^{cm+1})$, and $r \in R_{q,\sigma}$. The function $f_{\vec\beta,\vec\alpha}$ then receives $|r| \cdot |V|$ different pairs of arguments. Finally, computing $f_{\vec\beta,\vec\alpha}(\langle q', x_j\rangle(\cdots))$ takes $O(|V|^{cm} \cdot c)$ time, where the factor $|V|^{cm}$ comes from the part "$\exists\vec\gamma$" and the factor c comes from the part "$u \in \gamma_i$". Correctness follows from the following claim.

Claim. For every input tree s', $t' \in [\![\langle q, s'\rangle(u_1, \ldots, u_n)]\!]_{OI}$ if and only if there exist subtrees $t_{1,1}, \ldots, t_{1,l_1}, \ldots, t_{n,1}, \ldots, t_{n,l_n}$ of t such that $\{t_{i,1}, \ldots, t_{i,l_i}\} \subseteq [\![u_i]\!]_{OI}$ with $l_i \le c$ and

$$(q, (\{\rho(t_{1,1}), \ldots, \rho(t_{1,l_1})\}, \ldots, \{\rho(t_{n,1}), \ldots, \rho(t_{n,l_n})\}), \rho(t')) \in \mathit{run}(s'),$$

where ρ is defined as in the proof of Theorem 4.

The proof is again by induction. The finite-copying property ensures that, in the semantics of the mtt, OI-substitution is done only on parameters $y_i$ that occur at most $c$ times. This justifies that our algorithm considers only sets of size at most $c$ as parameter representations. □
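The matching function $f_{\bar\beta,\bar\alpha}$ can be sketched in Python as a direct transcription of its five cases. The tuple encodings of right-hand sides, the `label`/`children` maps for the output dag, and the 0-based indices are hypothetical choices for illustration; the bound $|\beta_i| \le c$ is assumed rather than enforced, and the memoization over pairs $(r, v')$ that the complexity count relies on is omitted.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical encodings (not the paper's notation):
#   ("y", i)                -> parameter y_{i+1}            (0-based i)
#   ("out", sym, [r1..rn])  -> output symbol sym(r1, ..., rn)
#   ("call", q, j, [r1..rn]) -> state call <q, x_{j+1}>(r1, ..., rn)
# A triple (q, gamma, w) in alpha[j] means: q on child j, with parameter
# occurrence sets gamma, may produce the output subtree rooted at dag node w.
Triple = Tuple[str, Tuple[FrozenSet[int], ...], int]


def f(r, v: int, beta: Sequence[FrozenSet[int]], alpha: Sequence[Set[Triple]],
      label: Dict[int, str], children: Dict[int, List[int]]) -> bool:
    """Can right-hand side r produce the output subtree rooted at dag node v?"""
    kind = r[0]
    if kind == "y":
        # a parameter matches v iff v is one of the subtrees bound to it
        return v in beta[r[1]]
    if kind == "out":
        _, sym, subs = r
        if label[v] != sym or len(subs) != len(children[v]):
            return False
        return all(f(sub, children[v][i], beta, alpha, label, children)
                   for i, sub in enumerate(subs))
    # kind == "call": some triple for child j must yield v, and every
    # argument must match every node in the corresponding occurrence set
    _, q2, j, args = r
    for (q3, gamma, w) in alpha[j]:
        if q3 == q2 and w == v and all(
                f(args[i], u, beta, alpha, label, children)
                for i in range(len(args)) for u in gamma[i]):
            return True
    return False
```

For example, with the output dag of $a(b, b)$ (node 0 labeled `a`, node 1 labeled `b`, shared), the right-hand side $a(y_1, y_1)$ matches node 0 exactly when node 1 is in $\beta_1$.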

On the other hand, the PTIME result for IO-mtts can be 
generalized to a more powerful extension of IO-mtts. One 
popular way to extend mtts is by regular look-ahead. Mtts 
with regular look-ahead are equipped with one determin- 
istic bottom-up tree automaton and are allowed to select 
a rule with respect to the state of the tree automaton, in 
addition to the current state and the label of the current 
node. Since any $MTT_{IO}$ with regular look-ahead can be simulated by a normal $MTT_{IO}$ (Theorem 5.19 of [B]), translation membership for $MTT_{IO}$ with regular look-ahead
is also in PTIME. In fact, we can further extend the model 
to use a more expressive model of look-ahead, namely, tree 
automata with equality and disequality constraints 1 , while 
still preserving the PTIME translation membership. 

Definition 2. A bottom-up tree automaton with equality and disequality constraints (TAC) is a tuple $B = (P, \Sigma, \delta)$, where $P$ is the set of states, $\Sigma$ the input alphabet, and $\delta$ a set of transitions of the form $(\sigma^{(m)}, p_1, \dots, p_m, E, D, p)$, where $E, D \subseteq \{1, \dots, m\}^2$ are the sets of equality and disequality constraints, respectively. A list of trees $t_1, \dots, t_m$ is said to satisfy the constraints if $t_i = t_j$ for all $(i, j) \in E$ and $t_i \ne t_j$ for all $(i, j) \in D$. We define $\delta$ on trees inductively as follows:

$$\delta(\sigma(t_1, \dots, t_m)) = \{p \in P \mid \exists (\sigma, p_1, \dots, p_m, E, D, p) \in \delta : p_i \in \delta(t_i) \text{ for all } i \text{ and } t_1, \dots, t_m \text{ satisfy } E \text{ and } D\}.$$

A TAC is total and deterministic if for any $\sigma \in \Sigma^{(m)}$, $p_1, \dots, p_m \in P$, and $t_1, \dots, t_m \in T_\Sigma$, there exists exactly one transition $(\sigma^{(m)}, p_1, \dots, p_m, E, D, p) \in \delta$ such that $t_1, \dots, t_m$ satisfy the constraints $E$ and $D$. For a total deterministic TAC, we abuse notation and denote by $\delta(t)$ the unique element of the singleton set $\delta(t)$ itself.
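To make Definition 2 concrete, here is a small Python sketch that computes $\delta(t)$ bottom-up for a total deterministic TAC. Trees are nested tuples, constraint pairs are 0-based, and the example automaton over a hypothetical alphabet $\{n^{(2)}, c^{(0)}\}$ (state `same` iff the node is labeled $n$ and its two children are equal) is made up for illustration. Equality is tested structurally here, which costs time proportional to the subtree sizes.

```python
from typing import Dict, FrozenSet, List, Tuple

Pairs = FrozenSet[Tuple[int, int]]  # constraint pairs (i, j), 0-based
Transitions = Dict[Tuple[str, Tuple[str, ...]], List[Tuple[Pairs, Pairs, str]]]


def satisfies(ts: tuple, E: Pairs, D: Pairs) -> bool:
    """t_1, ..., t_m satisfy E and D (structural equality on nested tuples)."""
    return (all(ts[i] == ts[j] for i, j in E)
            and all(ts[i] != ts[j] for i, j in D))


def run_tac(t: tuple, delta: Transitions) -> str:
    """delta(t) for a total deterministic TAC, computed bottom-up."""
    label, kids = t[0], t[1:]
    states = tuple(run_tac(k, delta) for k in kids)
    hits = [p for (E, D, p) in delta[(label, states)] if satisfies(kids, E, D)]
    if len(hits) != 1:
        raise ValueError("TAC is not total and deterministic at this node")
    return hits[0]


# Hypothetical example TAC over {n/2, c/0}.
NONE: Pairs = frozenset()
P12: Pairs = frozenset({(0, 1)})  # the pair (t_1, t_2)
delta: Transitions = {("c", ()): [(NONE, NONE, "other")]}
for p1 in ("same", "other"):
    for p2 in ("same", "other"):
        # exactly one of "t_1 = t_2" / "t_1 != t_2" holds: total and deterministic
        delta[("n", (p1, p2))] = [(P12, NONE, "same"), (NONE, P12, "other")]
```

For instance, `run_tac(("n", ("c",), ("c",)), delta)` yields `"same"`, while `n(c, n(c, c))` yields `"other"`.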

Note that, as with a normal bottom-up tree automaton, we can run a TAC on a tree in (amortized) linear time by first computing the minimal dag representation of the input tree; due to its minimality, the equality (or disequality) test of two subtrees can be carried out in constant time by a single pointer comparison. Also note that total deterministic TACs are as expressive as their nondeterministic counterparts (as shown in Proposition 4.2 of [T] by a variant of the usual powerset construction). Hence, we adopt total deterministic TACs as our look-ahead model for mtts without sacrificing expressiveness.
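The minimal-dag trick can be sketched by hash-consing: each node is looked up by its label together with the identities of its already-canonical children, so structurally equal subtrees become one shared object and an equality constraint reduces to a pointer comparison (`is`). The `Node` class and the `to_min_dag` and `equal` functions are hypothetical names; hashing is what makes the construction amortized rather than worst-case linear.

```python
from typing import Dict, Tuple


class Node:
    """A node of the minimal dag; equal subtrees are represented by one Node."""
    __slots__ = ("label", "children")

    def __init__(self, label: str, children: Tuple["Node", ...]) -> None:
        self.label = label
        self.children = children


def to_min_dag(t: tuple, table: Dict[tuple, Node]) -> Node:
    """Hash-cons the nested-tuple tree t = (label, child_1, ..., child_m)."""
    kids = tuple(to_min_dag(c, table) for c in t[1:])
    # Children are already canonical, so (label, ids of children) is a unique
    # key; the table keeps the nodes alive, so the ids stay valid.
    key = (t[0],) + tuple(id(k) for k in kids)
    node = table.get(key)
    if node is None:
        node = Node(t[0], kids)
        table[key] = node
    return node


def equal(u: Node, v: Node) -> bool:
    """Equality of the represented subtrees: a single pointer comparison."""
    return u is v
```

Building $n(f(c), f(c))$ this way yields three dag nodes, and the two children of the root are the same object.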

Definition 3. An mtt with TAC look-ahead is a tuple $M = (Q, q_0, \Sigma, \Delta, R, B)$, where $B = (P, \Sigma, \delta)$ is a total and deterministic TAC, and all other components are defined as for mtts, except that rules have the following form:

$$\langle q, \sigma(x_1, \dots, x_k)\rangle(y_1, \dots, y_m) \to r \quad \langle p_1, \dots, p_k, E, D\rangle.$$

The set of right-hand sides of all rules of this form is denoted by $R_{q,\sigma,p_1,\dots,p_k,E,D}$. The size $|M|$ is defined as for normal mtts.

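The semantics of mtts with TAC look-ahead differs from that of normal mtts only in the side condition of state application, which is defined as follows:

$$[\![\langle q, \sigma(s_1, \dots, s_k)\rangle(u_1, \dots, u_m)]\!]_{IO} = \bigcup_{r \in R'} \left([\![ r[x_1/s_1, \dots, x_k/s_k] ]\!]_{IO} \leftarrow ([\![u_1]\!]_{IO}, \dots, [\![u_m]\!]_{IO})\right)$$

where $R'$ is the union of the sets $R_{q,\sigma,\delta(s_1),\dots,\delta(s_k),E,D}$ over all $E, D$ such that $s_1, \dots, s_k$ satisfy $E$ and $D$.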

In words, rules in $R_{q,\sigma,p_1,\dots,p_k,E,D}$ are used when the state $q$ is applied to a node satisfying all of the following three conditions: (1) the node is labeled $\sigma$, (2) the child subtrees $s_1, \dots, s_k$ of the node satisfy the constraints $E$ and $D$, and (3) $\delta(s_i) = p_i$ for all $i$.
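The three conditions can be read as a filter over the rule table. In this Python sketch, rules are stored in a dictionary keyed by a hypothetical tuple $(q, \sigma, (p_1, \dots, p_k), E, D)$, `la_state` stands for the look-ahead function $\delta$ of the TAC and is supplied by the caller, trees are nested tuples, and constraint pairs are 0-based.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Pairs = FrozenSet[Tuple[int, int]]  # constraint pairs (i, j), 0-based
Key = Tuple[str, str, Tuple[str, ...], Pairs, Pairs]  # (q, sigma, ps, E, D)


def applicable_rhs(q: str, node: tuple, rules: Dict[Key, List[object]],
                   la_state: Callable[[tuple], str]) -> List[object]:
    """Right-hand sides usable when state q is applied to node = (sigma, s_1, ..., s_k)."""
    sigma, kids = node[0], node[1:]
    ps = tuple(la_state(k) for k in kids)  # look-ahead states delta(s_i)
    result: List[object] = []
    for (q2, sigma2, ps2, E, D), rhss in rules.items():
        # same state, condition (1) on the label, condition (3) on delta(s_i)
        if q2 != q or sigma2 != sigma or ps2 != ps:
            continue
        # condition (2): the child subtrees satisfy E and D
        if (all(kids[i] == kids[j] for i, j in E)
                and all(kids[i] != kids[j] for i, j in D)):
            result.extend(rhss)
    return result
```

With a single look-ahead state `"p"` and the equality constraint on the first two children, a rule with right-hand side `e` fires on $n(s, s)$ but not on $n(c, n(c, c))$, matching the translation $\{(n(s, s), e)\}$ discussed next.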

Mtts with TAC look-ahead are strictly more expressive than normal mtts. For example, the translation $\{(n(s, s), e) \mid s \in T_\Sigma\}$, where $n$ is a symbol of rank 2 and $e$ is of rank 0, can be realized by a transducer with TAC look-ahead. But no composition of mtts can realize this translation, because its domain is not regular (by Corollary 5.6 of [6], the domain of any mtt must be a regular tree language). Nevertheless, the PTIME translation membership result for $MTT_{IO}$ extends to mtts with TAC look-ahead.

Theorem 6. Let $M$ be an mtt with TAC look-ahead. Translation membership for $\tau_{IO,M}$ can be determined in time $O(|s| \cdot |t|^{2m+2} \cdot |M|)$, where $m$ is the maximum rank of $M$'s states.

Proof. The basic idea is again the on-the-fly construction of the inverse-type automaton, but this time, to deal with the look-ahead, we run the look-ahead automaton in parallel.

Let $s_{dag}$ be the minimal dag representation of $s$, which can be computed in $O(|s|)$ time. As explained before, the equality (or disequality) test of two subtrees of $s_{dag}$ can be carried out in constant time. Let $V_s$ be the set of nodes of $s_{dag}$. Let $V_t$ be the set of nodes of $t_{dag}$ and $V = V_t \cup \{\bot\}$. The functions $label(v)$, $child(v, i)$, and $p(t)$ are defined as in the proof of Theorem 4.



Let $A = 2^{\bigcup_i Q^{(i)} \times V^i \times V}$, and let $run : T_\Sigma \to V_s \times P \times A$ (note the difference in the return value of run compared to that in Theorem 2) be the function defined as follows:

$$run(s') = tr(s', \sigma, run(s_1), \dots, run(s_k)) \quad \text{with } s' = \sigma(s_1, \dots, s_k)$$

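where the function tr is:

$$tr(s', \sigma, (s_1, p_1, \alpha_1), \dots, (s_k, p_k, \alpha_k)) = \big(s', \delta(s'), \{(q, \bar v, v) \in \textstyle\bigcup_i Q^{(i)} \times V^i \times V \mid \exists E, D, \exists r \in R_{q,\sigma,p_1,\dots,p_k,E,D} : (s_1, \dots, s_k) \text{ satisfies } E, D \text{ and } f_{\bar v,\bar\alpha}(r, v)\}\big).$$

The definition of $f_{\bar v,\bar\alpha}$ remains the same as in Theorem 4.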

The look-ahead state $\delta(s')$ can be computed from $\sigma$, $p_1, \dots, p_k$, and $s_1, \dots, s_k$ in constant time, since the constraint tests are pointer comparisons in $s_{dag}$. By the same argument as in the case of normal mtts, we obtain the $O(|s| \cdot |t|^{2m+2} \cdot |M|)$ time complexity. The correctness of the construction is also proved in the same way as for normal mtts. That is, we can prove the following claim by nested induction on the structure of $s'$ and then on the structure of the right-hand sides of the rules.

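Claim. For every input tree $s'$, the following equation holds for all $q \in Q$, $r_1, \dots, r_n \in T_{\Delta \cup (Q \times T_\Sigma) \cup Y}$, and environments $\Gamma$ (membership in $run(s')$ refers to its third component):

$$p([\![\langle q, s'\rangle(r_1, \dots, r_n)]\!]^{\Gamma}_{IO}) = \{v' \mid (q, (v_1, \dots, v_n), v') \in run(s') \text{ and } v_i \in p([\![r_i]\!]^{\Gamma}_{IO}) \text{ for all } i\}$$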

Again, applying the claim to $p([\![\langle q_0, s\rangle]\!]_{IO})$, we see that translation membership is equivalent to $(q_0, (), v_t) \in run(s)$, where $v_t$ is the root node of $t_{dag}$. Hence, translation membership can be determined by computing the set $run(s)$. □

Another extension of mtts that admits polynomial-time translation membership is the multi-return mtt (mr-mtt) [9, 10]. In an mr-mtt, states may return multiple trees (with the initial state returning exactly one tree). Mr-mtts are strictly more expressive than normal mtts and, furthermore, have better closure properties under composition with top-down tree transducers [10].

Definition 4. A multi-return macro tree transducer (mr-mtt) $M$ is a tuple $(Q, \Sigma, \Delta, q_0, R, D)$, where $Q$, $\Sigma$, $\Delta$, and $q_0$ are defined as for mtts, $D : Q \to \mathbb{N}$ is the dimension, with $D(q_0) = 1$, and $R$ is a finite set of rules of the form

$$\langle q, \sigma(x_1, \dots, x_k)\rangle(y_1, \dots, y_m) \to r$$

where $q \in Q^{(m)}$, $\sigma \in \Sigma^{(k)}$, and $r \in rhs^{D(q)}_Q$, where, for $e \ge 1$ and a set $Q$, the set $rhs^e_Q$ is defined as:

$$r ::= l_1 \cdots l_n\,(u_1, \dots, u_e) \quad (n \ge 0)$$

$$l ::= \mathtt{let}\ (z_{j+1}, \dots, z_{j+D(q')}) = \langle q', x_i\rangle(u_1, \dots, u_{m'})\ \mathtt{in} \quad (j \in \mathbb{N},\ q' \in Q^{(m')},\ x_i \in X)$$

with $u_1, u_2, \dots \in T_{\Delta \cup Y_m \cup Z}$. We usually omit parentheses around tuples of size one, i.e., we write, e.g., let $z_3 = \cdots$ in $u_1$.



We require every rule to be well-formed, that is, the leftmost occurrence of any variable $z_i$ must appear at a "binding" position (between 'let' and '='), and every later occurrence (if any) must appear after the 'in' corresponding to the binding occurrence. The set of right-hand sides of such rules is denoted by $R_{q,\sigma}$. The size $|M|$ of the mr-mtt is defined to be the sum of the sizes of the right-hand sides, i.e., the number of $\Delta$, $Y$, $Z$, and $Q \times X$ nodes.
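The well-formedness condition can be checked in a single left-to-right pass over the let-bindings. In this Python sketch, a right-hand side $l_1 \cdots l_n (u_1, \dots, u_e)$ is encoded (hypothetically) as a list of `(bound_zs, q, i, args)` entries plus the final tuple `outs`; terms are nested tuples and variables are bare strings such as `"y1"` or `"z2"`. As an extra assumption, the sketch also rejects binding the same $z$ twice.

```python
from typing import Sequence, Set


def zs_in(term) -> Set[str]:
    """Z-variables occurring in a term; variables are bare strings, nodes are tuples."""
    if isinstance(term, str):
        return {term} if term.startswith("z") else set()
    return set().union(*(zs_in(sub) for sub in term[1:]))


def well_formed(lets: Sequence[tuple], outs: Sequence[object]) -> bool:
    """Check that every z is bound by an earlier 'let' before it is used."""
    bound: Set[str] = set()
    for zs, _q, _i, args in lets:
        # arguments of this call may only use variables bound by earlier lets
        if any(not zs_in(a) <= bound for a in args):
            return False
        # assumption: each z is bound at most once
        if any(z in bound for z in zs) or len(set(zs)) != len(zs):
            return False
        bound |= set(zs)
    # the final tuple may use every bound variable
    return all(zs_in(u) <= bound for u in outs)
```

For the rule $\langle q_1, s(x_1)\rangle(y_1) \to \mathtt{let}\ (z_1, z_2) = \langle q_1, x_1\rangle(A(y_1))\ \mathtt{in}\ (a(z_1), z_2)$ the check succeeds, while using $z_1$ inside its own binding's arguments makes it fail.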



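The IO-semantics of mr-mtts is defined inductively as follows. For $u \in T_{\Delta \cup Y \cup Z}$, the set $[\![u]\!]_{IO} \subseteq T_{\Delta \cup Y \cup Z}$ is

$$[\![\delta(u_1, \dots, u_k)]\!]_{IO} = \{\delta(t_1, \dots, t_k) \mid t_i \in [\![u_i]\!]_{IO} \text{ for all } i\}$$

$$[\![x]\!]_{IO} = \{x\} \quad (x \in Y \cup Z)$$

and for $\kappa \in rhs^e$, the set $[\![\kappa]\!]_{IO} \subseteq (T_{\Delta \cup Y \cup Z})^e$ is

$$[\![(u_1, \dots, u_e)]\!]_{IO} = \{(t_1, \dots, t_e) \mid t_i \in [\![u_i]\!]_{IO} \text{ for all } i\}$$

$$[\![\mathtt{let}\ (z_1, \dots, z_d) = \langle q, s'\rangle(u_1, \dots, u_m)\ \mathtt{in}\ \kappa']\!]_{IO} = \bigcup \{[\![\kappa'[z_1/t_1, \dots, z_d/t_d]]\!]_{IO} \mid (t_1, \dots, t_d) \in [\![\langle q, s'\rangle(u_1, \dots, u_m)]\!]_{IO}\}$$

where, for $s' = \sigma(s_1, \dots, s_k)$,

$$[\![\langle q, \sigma(s_1, \dots, s_k)\rangle(u_1, \dots, u_m)]\!]_{IO} = \bigcup_{r \in R_{q,\sigma}} \left([\![ r[x_1/s_1, \dots, x_k/s_k] ]\!]_{IO} \leftarrow ([\![u_1]\!]_{IO}, \dots, [\![u_m]\!]_{IO})\right).$$

The translation $\tau_{IO,M} \subseteq T_\Sigma \times T_\Delta$ realized by $M$ is the set $\{(s, t) \mid t \in [\![\mathtt{let}\ z = \langle q_0, s\rangle\ \mathtt{in}\ z]\!]_{IO}\}$.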

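Here is an example of an mr-mtt, which is used in [9] as a counterexample that cannot be realized by normal mtts:

$$\langle q_0, s(x_1)\rangle() \to \mathtt{let}\ (z_1, z_2) = \langle q_1, x_1\rangle(A(E))\ \mathtt{in}\ r(a(z_1), z_2)$$

$$\langle q_0, s(x_1)\rangle() \to \mathtt{let}\ (z_1, z_2) = \langle q_1, x_1\rangle(B(E))\ \mathtt{in}\ r(b(z_1), z_2)$$

$$\langle q_0, z\rangle() \to r(e, E)$$

$$\langle q_1, s(x_1)\rangle(y_1) \to \mathtt{let}\ (z_1, z_2) = \langle q_1, x_1\rangle(A(y_1))\ \mathtt{in}\ (a(z_1), z_2)$$

$$\langle q_1, s(x_1)\rangle(y_1) \to \mathtt{let}\ (z_1, z_2) = \langle q_1, x_1\rangle(B(y_1))\ \mathtt{in}\ (b(z_1), z_2)$$

$$\langle q_1, z\rangle(y_1) \to (e, y_1)$$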

This nondeterministic translation takes as input monadic trees of the form $s(s(\cdots s(z) \cdots))$ and produces output trees of the form $r(t_1, t_2)$, where $t_1$ is a monadic tree over a's and b's (with a leaf e), and $t_2$ is a monadic tree over A's and B's (with a leaf E) such that $t_2$ is the reverse of $t_1$, and both have the same size as the input. For instance, $r(a(a(b(e))), B(A(A(E))))$ is a possible output tree for the input $s(s(s(z)))$. Consider the return value of the state call $[\![\langle q_1, s(z)\rangle(E)]\!]_{IO}$: it is the set $\{(a(e), A(E)), (b(e), B(E))\}$ of pairs of trees. In words, the state $q_1$ returns only mutually reverse pairs of monadic trees. This is impossible in normal mtts, in which we must carry out two state calls in order to obtain two output trees; two nondeterministic state calls are evaluated independently and cannot avoid generating unrelated pairs of trees.
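The example can be checked by hand-translating its six rules into Python functions that enumerate the IO output sets. This is an illustrative sketch, not a general mr-mtt interpreter: trees are nested tuples with leaves as 1-tuples, and `q0`, `q1`, and `spine` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Tree = tuple  # (label, child_1, ..., child_m); leaves are 1-tuples like ("e",)


def q1(s: Tree, y: Tree) -> Set[Tuple[Tree, Tree]]:
    """[[<q1, s>(y)]]_IO as a set of pairs of output trees."""
    if s[0] == "z":
        return {(("e",), y)}                      # <q1, z>(y1) -> (e, y1)
    out: Set[Tuple[Tree, Tree]] = set()
    for low, up in (("a", "A"), ("b", "B")):      # the two rules for <q1, s(x1)>
        for z1, z2 in q1(s[1], (up, y)):          # let (z1, z2) = <q1, x1>(up(y1))
            out.add(((low, z1), z2))              # in (low(z1), z2)
    return out


def q0(s: Tree) -> Set[Tree]:
    """[[<q0, s>]]_IO, the set of all outputs for input s."""
    if s[0] == "z":
        return {("r", ("e",), ("E",))}            # <q0, z>() -> r(e, E)
    out: Set[Tree] = set()
    for low, up in (("a", "A"), ("b", "B")):
        for z1, z2 in q1(s[1], (up, ("E",))):     # let (z1, z2) = <q1, x1>(up(E))
            out.add(("r", (low, z1), z2))         # in r(low(z1), z2)
    return out


def spine(t: Tree) -> List[str]:
    """Labels along a monadic tree, from the root down to the leaf."""
    labels = []
    while True:
        labels.append(t[0])
        if len(t) == 1:
            return labels
        t = t[1]
```

For the input $s(s(s(z)))$ this yields eight outputs, one per sequence of nondeterministic choices, and in each of them the A/B spine of $t_2$ is the upper-cased reverse of the a/b spine of $t_1$.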

Despite their greater expressive power, mr-mtts still have a similar complexity for inverse type inference; therefore, translation membership remains in PTIME.

Theorem 7. Let $M$ be an mr-mtt. Translation membership for $\tau_{IO,M}$ can be determined in time $O(|s| \cdot |t|^{2m+2d} \cdot |M|)$, where $m$ is the maximum rank of the states and $d$ is the maximum dimension.



Proof. For mr-mtts, we take the set $A$ of states of the inverse-type automaton as $A = 2^{\bigcup_{i,j} Q^{(i,j)} \times V^i \times V^j}$, where $Q^{(i,j)}$ is the set of states $q$ with $rank(q) = i$ and $D(q) = j$. The intuition behind $A$ is similar to the case of normal mtts: "$(q, \bar v, \bar w) \in run(s')$" means that "if $q$ is applied to the input subtree $s'$ with the output subtrees rooted at $\bar v$ as parameters, then it may return the tuple of output subtrees rooted at $\bar w$". The construction is quite similar to that of the proof of Theorem 4. □



As a final remark, we would like to mention the complexity of translation membership for (compositions of) deterministic mtts: it can be determined in linear time. Since domains of compositions of mtts are regular, we can factor out the partiality and obtain the following decomposition: for $\mu \in \{IO, OI\}$, $DMTT^n_\mu \subseteq FTA ; DtMTT^n_\mu$, where FTA is the class of partial identities whose domain is regular (analogous to Theorem 6.18 of [6]). Therefore, to decide translation membership for a composition of deterministic mtts, we first check in $O(|s|)$ time whether the given input $s$ is contained in the domain of the translation, and then check translation membership for the composition of deterministic and total mtts. Here, by Theorem 15 of [12], for a translation $\tau \in DtMTT^n_\mu$ we can compute the unique output tree $t' \in \tau(s)$ from the input $s$ in time $O(|s| + |t'|)$, and during the computation the size of every intermediate tree is at most $2^n \cdot |t'|$. Hence, to test whether $(s, t) \in \tau$, we simply compute $\tau(s)$: if the size of any intermediate tree exceeds $2^n \cdot |t|$, then $(s, t)$ cannot be an element of $\tau$; otherwise, we compare the computed tree $\tau(s)$ with $t$. The time complexity of this procedure is $O(|s| + 2^n \cdot |t|)$.

Theorem 8. Let $\mu \in \{IO, OI\}$ and $n \ge 1$. Translation membership for $DMTT^n_\mu$ is in $O(|s| + 2^n \cdot |t|)$.
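The membership test behind Theorem 8 can be sketched as follows, assuming each total deterministic stage is given as a black-box Python function and the regular domain check as a predicate `in_domain` (both hypothetical stand-ins for the FTA part and the $DtMTT_\mu$ stages). Unlike the procedure described above, this sketch checks the size bound only after a stage has finished, so it illustrates the pruning logic but does not achieve the $O(|s| + 2^n \cdot |t|)$ bound.

```python
from typing import Callable, Sequence

Tree = tuple  # (label, child_1, ..., child_m)


def size(t: Tree) -> int:
    """Number of nodes of t."""
    return 1 + sum(size(c) for c in t[1:])


def member(s: Tree, t: Tree, in_domain: Callable[[Tree], bool],
           stages: Sequence[Callable[[Tree], Tree]]) -> bool:
    """Decide (s, t) in FTA ; tau_1 ; ... ; tau_n for total deterministic stages."""
    if not in_domain(s):                  # the regular partial identity (FTA) part
        return False
    # Assumed property of the stages (Theorem 15 of [12] as quoted above):
    # if t is the output, no intermediate tree exceeds 2^n * |t|.
    bound = 2 ** len(stages) * size(t)
    cur = s
    for stage in stages:
        cur = stage(cur)
        if size(cur) > bound:             # too large: the final output cannot be t
            return False
    return cur == t
```

With a single hypothetical stage `dup` mapping $u$ to $d(u, u)$, the pair $(a, d(a, a))$ is accepted, while $(a, a)$ is rejected as soon as the intermediate tree exceeds $2^1 \cdot 1$ nodes.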

5. FUTURE WORK 

The complexity of the translation membership problem remains open for several interesting subclasses and extensions of mtts. One example is the mtt with holes [14] in IO mode. Note that, similarly to Theorem 4.6 of [T], hole-mtts in IO mode are equal to $MTT_{IO}$ ; YIELD, which is included in $MTT_{IO}$ ; LDtMTT. An algorithm based on inverse type inference does not work here, because the parameter part of the states of the inverse-type automaton is a set of functions $[V \to V]$, whose size is exponential in the size $|t|$ of the output tree. On the other hand, it is not clear either whether the problem is NP-hard. Note that mtts with holes in OI mode can simulate all OI-mtts, and therefore their translation membership is NP-complete.

Another interesting class is that of 1-parameter mtts in OI 
mode. Our encoding of 3-SAT used three parameters. In 
fact, the number of parameters can be reduced to two by 
embedding the encodings of boolean variables in the input 
tree s. Can we encode 3-SAT into a 1-parameter mtt? Or do
1-parameter mtts actually have PTIME translation membership?
(Again, the inverse-type automaton technique used
in this paper for IO-mtts does not seem to work in this case, 
because the automaton gets too large.) 